Take-home Exercise 1

0. Goals & Objectives.

On this page, we define Exploratory Spatial Data Analysis (ESDA) to understand spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Important Note

Due to the nature of EDA and Data Analysis, parts of this page have been Collapsed or placed behind tabs, to avoid excessive scrolling.

0.1 Motivation

Public transport is a key concern for residents in land-scarce, population-dense Singapore. With COEs reaching record highs and authorities announcing the termination of bus services, there is no better time to understand the public transport network and systems in Singapore.

By understanding and modelling patterns of public transport consumption - including specifically Local Indicators of Spatial Autocorrelation (LISA), insights can be provided to both public and private sector to formulate more informed decisions that for the benefit of the public, and not just for profit.

1. Geospatial Data Wrangling

This study was performed in R, written in R Studio, and published using Quarto.

1.1 Import Packages

This function calls pacman to load sf, tidyverse, tmap, knitr packages;

  • tmap : For thematic mapping; powerful mapping package
  • sf : for geospatial data handling, but also geoprocessing: buffer, point-in-polygon count, etc
    • batch processing over GIS packages; can handle tibble format
  • sfdep : creates space-time cube, EHSA; replaces spdep
  • tidyverse : for non-spatial data handling; commonly used R package and contains dplyr for dataframe manipulation
  • mapview : interactive plotting & manipulation
  • RColorBrewer : custom colour palettes for manipulation
pacman::p_load(tmap, sf, sfdep, tidyverse, mapview, RColorBrewer)

1.2 Import Data

1.2.1 Geospatial Load Bus Stop Data

  • First, we load BusStop shapefile data from LTA Datamall.
  • st_read() is used to import the ESRI Shapefile data into an sf dataframe
show code
raw_bus_stop_sf <- st_read(dsn = "data/geospatial", 
                 layer = "BusStop") 
Reading layer `BusStop' from data source 
  `C:\1darren\ISSS624\Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
show code
head(raw_bus_stop_sf)
Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13228.59 ymin: 30391.85 xmax: 41603.76 ymax: 44206.38
Projected CRS: SVY21
  BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
1      22069        B06   OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2      32071        B23         AFT TRACK 13 POINT (13228.59 44206.38)
3      44331        B01              BLK 239  POINT (21045.1 40242.08)
4      96081        B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5      11561        B05              BLK 166 POINT (24568.74 30391.85)
6      66191        B03         AFT CORFE PL POINT (30951.58 38079.61)
  • We check the coordinate reference system with st_crs();
show code
st_crs(raw_bus_stop_sf)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
  • We see that although the projection is SVY21, the CRS (coordinate reference system) is EPSG 9001, which is incorrect;
  • We thus use st_set_crs() to set it to 3414, which is the EPSG code for SVY21
    • we then use st_crs() to confirm it is the correct EPSG code
show code
raw_bus_stop_sf <- st_set_crs(raw_bus_stop_sf, 3414)
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
show code
st_crs(raw_bus_stop_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
  • We now do a quick plot to quickly visualize the bus stops;
    • qtm() is a wrapper to do a quick, simple plot using tmap without defining arguments
    • we use tmap_mode("plot") to set the map as a static image, rather than as an interactive map, in order to save processing.
show code
tmap_mode("plot")
tmap mode set to plotting
show code
qtm(raw_bus_stop_sf)

1.2.2 Visualizing within Singapore’s Administrative National Boundary

  • This image is hard to read; we can vaguely discern Singapore’s Southern coastline, but it can be hard to visualize
  • I have sourced and downloaded supplemental data about Singapore’s Administrative National Boundary (“SANB”) from igismap.com as a base layer for visualization;
    • We set tmap_mode("view") to allow us to scroll and confirm the SARB boundaries
    • As before, we use st_read() to load the shapefile data, and st_transform() to ensure the projection is correct
      • We use tm_shape() + tm_polygons() to map a grey layer of the SARB boundaries;
      • On top of which, we layer tm_shape() + tm_dots() to show the bus stops.
show code
sanb_sf <- st_read(dsn = "data/geospatial", 
                 layer = "singapore_administrative_national_boundary") %>%
  st_transform(crs = 3414)
Reading layer `singapore_administrative_national_boundary' from data source 
  `C:\1darren\ISSS624\Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6056 ymin: 1.158682 xmax: 104.4074 ymax: 1.471564
Geodetic CRS:  WGS 84
show code
tmap_mode("view")
tmap mode set to interactive viewing
show code
tm_shape(sanb_sf) +
  tm_polygons() + 
  tm_shape(raw_bus_stop_sf) + 
  tm_dots()
show code
## Additional data from: Data.gov.sg, https://beta.data.gov.sg/datasets/d_02cba6aeeed323b5f6c723527757c0bc/view
Does the map look skewed to the right?

That’s because the Singapore National Administrative Boundaries map includes Pedra Branca, the much-disputed outlying island and easternmost point of Singapore.

It’s an amusing artifact here, but will not be involved in further analysis later.

  • We note there are a number of bus stops outside Singapore’s boundaries; Specifically, three bus stops in a cluster just outside the Causeway, and one further North.
  • We perform several steps to isolate & check the data;
    • we use st_filter() to find bus stops within Singapore’s Administrative National Boundaries, and create sg_bus_stop_sf for future use
    • to check what bus stops have been dropped, we use anti_join() to compare raw_bus_stop_sf with sg_bus_stop_sf. We use st_drop_geometry on both sf dataframes to
show code
sg_bus_stop_sf <- st_filter(raw_bus_stop_sf, sanb_sf)
anti_join(st_drop_geometry(raw_bus_stop_sf), st_drop_geometry(sg_bus_stop_sf))
Joining with `by = join_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC)`
  BUS_STOP_N BUS_ROOF_N            LOC_DESC
1      47701        NIL          JB SENTRAL
2      46239         NA          LARKIN TER
3      46609         NA     KOTARAYA II TER
4      46211        NIL JOHOR BAHRU CHECKPT
5      46219        NIL JOHOR BAHRU CHECKPT
  • We see there are in fact 5 bus stops outside of Singapore (including the defunct Kotaraya II Terminal) that have been removed, which means our data cleaning was correct.

1.3 Geospatial Data Cleaning

1.3.1 Removing Duplicate Bus Stops

  • But, do we need to do more data cleaning?
  • We use length() to find the total number of raw values in the $BUS_STOP_N column of sg_bus_stop_sf
    • We then compare this to length(unique()) to find the unique values
  • And, indeed, we find there are 16 bus stops that are repeated;
cat("Total number of rows in sg_bus_stop_sf\t\t: ", paste0(length(sg_bus_stop_sf$BUS_STOP_N)))
Total number of rows in sg_bus_stop_sf      :  5156
cat("\nTotal unique bus stop IDs in sg_bus_stop_sf\t: ", paste0(length(unique(sg_bus_stop_sf$BUS_STOP_N))))

Total unique bus stop IDs in sg_bus_stop_sf :  5140
cat("\nRepeated bus stops\t\t\t\t:   ", paste0(length(raw_bus_stop_sf$BUS_STOP_N) - length(unique(raw_bus_stop_sf$BUS_STOP_N))))

Repeated bus stops              :    16
  • It appears there are 16 datapoints that are specifically repeated; let’s identify the bus stop numbers with repeated rows:
    • First we use filter() with a pipe mark (using or condition) to identify repeated bus stop numbers (i.e. $BUS_STOP_N)
    • We sort them in ascending order using arrange(); then, using group_by() and row_number() we add row numbers based on $BUS_STOP_N to a new column using mutate()
show code
repeated_df <- sg_bus_stop_sf %>%
  filter(duplicated(BUS_STOP_N) | duplicated(BUS_STOP_N, fromLast = TRUE)) %>% 
  arrange(BUS_STOP_N) %>%
  group_by(BUS_STOP_N) %>%
  mutate(RowNumber = row_number())

repeated_df
Simple feature collection with 32 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13488.02 ymin: 32594.17 xmax: 44144.57 ymax: 47934
Projected CRS: SVY21 / Singapore TM
# A tibble: 32 × 5
# Groups:   BUS_STOP_N [16]
   BUS_STOP_N BUS_ROOF_N LOC_DESC                       geometry RowNumber
 * <chr>      <chr>      <chr>                       <POINT [m]>     <int>
 1 11009      B04        Ghim Moh Ter        (23101.34 32594.17)         1
 2 11009      B04-TMNL   GHIM MOH TER        (23100.58 32604.36)         2
 3 22501      B02        Blk 662A             (13489.09 35536.4)         1
 4 22501      B02        BLK 662A            (13488.02 35537.88)         2
 5 43709      B06        BLK 644              (18963.42 36762.8)         1
 6 43709      B06        BLK 644             (18952.02 36751.83)         2
 7 47201      UNK        <NA>                (22616.75 47793.68)         1
 8 47201      NIL        W'LANDS NTH STN        (22632.92 47934)         2
 9 51071      B21        MACRITCHIE RESERVO… (28311.27 36036.92)         1
10 51071      B21        MACRITCHIE RESERVO… (28282.54 36033.93)         2
# ℹ 22 more rows
  • From examination, there are 32 bus stops sharing 16 bus stop numbers – 16 pairs of bus stops sharing the same number.
  • Let’s examine these bus stop pairs on the map;
    • we use mapview() to display these repeated bus stops on the map
    • we use col="BUS_STOP_N" with palette="Spectral to give each pair of bus stops an individual colour
show code
tmap_mode("plot")
tmap mode set to plotting
show code
tm_shape(sanb_sf) +
  tm_polygons() + 
  tm_shape(repeated_df) + 
  tm_dots(col = "BUS_STOP_N", palette = "Spectral")

  • After confirming with Prof Kam, we will simply drop the second instance of the rows.
    • we use duplicated() to identify rows with repeated values of $BUS_STOP_N; duplicated rows will return TRUE while all other rows will return FALSE
    • We use ! to invert the values, so only the unduplicated rows will return TRUE.
    • We then use square brackets [] to index sg_bus_stop_sf based on the rows, and return only the unduplicated rows;
show code
sg_bus_stop_sf <- sg_bus_stop_sf[!duplicated(sg_bus_stop_sf$BUS_STOP_N), ]
head(sg_bus_stop_sf)
Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13228.59 ymin: 30391.85 xmax: 41603.76 ymax: 44206.38
Projected CRS: SVY21 / Singapore TM
  BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
1      22069        B06   OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2      32071        B23         AFT TRACK 13 POINT (13228.59 44206.38)
3      44331        B01              BLK 239  POINT (21045.1 40242.08)
4      96081        B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5      11561        B05              BLK 166 POINT (24568.74 30391.85)
6      66191        B03         AFT CORFE PL POINT (30951.58 38079.61)

::: {.callout-warning collapse=“true”} ## !LONG! Alternative manual cleaning steps;

I was unable to take Prof Kam’s Data Analytics Lab, but I know of his fervour and attention to detail. I believe in informed choices, and so I performed manual cleaning with the following steps: - Remove the rows with lowercase names, as most $LOC_DESC are in strict uppercase - Remove the rows with NA $LOC_DESC - Remove the row with NIL $LOC_DESC - For remaining rows, drop second entry - Retain remaining rows - After this, we just run the same steps on sg_bus_stop_sf or perform an anti-join.

However, after clarification with Prof Kam, we just drop the second entry. My code is shown below for extra gratification

show code
drop_second_stop = c("43709", "51071", "53041", "52059", "58031", "68091", "68099", "97079")
rows_to_retain_df <- repeated_df %>%
  filter(
    case_when(
      BUS_STOP_N == "11009" & grepl("[a-z]", LOC_DESC) ~ FALSE,
      BUS_STOP_N == "22501" & grepl("[a-z]", LOC_DESC) ~ FALSE,
      BUS_STOP_N == "77329" & grepl("[a-z]", LOC_DESC) ~ FALSE,
      BUS_STOP_N == "82221" & grepl("[a-z]", LOC_DESC) ~ FALSE,
      BUS_STOP_N == "62251" & grepl("[a-z]", LOC_DESC) ~ FALSE,
      BUS_STOP_N == "96319" & grepl("[a-z]", LOC_DESC) ~ FALSE,

      BUS_STOP_N == "47201" & is.na(LOC_DESC) ~ FALSE,

      BUS_STOP_N == "67421" & BUS_ROOF_N == "NIL" ~ FALSE,
      BUS_STOP_N %in% drop_second_stop & RowNumber == 2 ~ FALSE,

      TRUE ~ TRUE
    )
  )

rows_to_retain_df$LOC_DESC  = toupper(rows_to_retain_df$LOC_DESC)

print("Printing rows to retain:")
[1] "Printing rows to retain:"
show code
rows_to_retain_df
Simple feature collection with 16 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13488.02 ymin: 32604.36 xmax: 44144.57 ymax: 47934
Projected CRS: SVY21 / Singapore TM
# A tibble: 16 × 5
# Groups:   BUS_STOP_N [16]
   BUS_STOP_N BUS_ROOF_N LOC_DESC                       geometry RowNumber
 * <chr>      <chr>      <chr>                       <POINT [m]>     <int>
 1 11009      B04-TMNL   GHIM MOH TER        (23100.58 32604.36)         2
 2 22501      B02        BLK 662A            (13488.02 35537.88)         2
 3 43709      B06        BLK 644              (18963.42 36762.8)         1
 4 47201      NIL        W'LANDS NTH STN        (22632.92 47934)         2
 5 51071      B21        MACRITCHIE RESERVO… (28311.27 36036.92)         1
 6 52059      B03        OPP BLK 65           (30770.3 34460.06)         1
 7 53041      B05        UPP THOMSON ROAD     (28105.8 37246.76)         1
 8 58031      UNK        OPP CANBERRA DR      (27089.69 47570.9)         1
 9 62251      B03        BEF BLK 471B        (35500.36 39943.34)         2
10 67421      B01        CHENG LIM STN EXIT… (34548.54 42052.15)         1
11 68091      B01        AFT BAKER ST        (32164.11 42695.98)         1
12 68099      B02        BEF BAKER ST         (32154.9 42742.82)         1
13 77329      B01        BEF PASIR RIS ST 53 (40765.35 39452.18)         1
14 82221      B01        BLK 3A               (35323.6 33257.05)         1
15 96319      NIL        YUSEN LOGISTICS     (42187.23 34995.78)         2
16 97079      B14        OPP ST. JOHN'S CRES (44144.57 38980.25)         1
  • If we run the steps from above, we can see that there are no repeated bus stops.
cat("Total number of rows in sg_bus_stop_sf\t\t: ", paste0(length(sg_bus_stop_sf$BUS_STOP_N)))
Total number of rows in sg_bus_stop_sf      :  5140
cat("\nTotal unique bus stop IDs in sg_bus_stop_sf\t: ", paste0(length(unique(sg_bus_stop_sf$BUS_STOP_N))))

Total unique bus stop IDs in sg_bus_stop_sf :  5140
cat("\nRepeated bus stops\t\t\t\t:   ", paste0(length(sg_bus_stop_sf$BUS_STOP_N) - length(unique(sg_bus_stop_sf$BUS_STOP_N))))

Repeated bus stops              :    0

1.4 Generating Hex Maps

  • We use st_make_grid() with square = FALSE to create the hexagon layer as defined in the study, which we name raw_hex_grid
    • We pass cellsize = 500 to create the hexagons of necessary size. Prof Kam defined the apothem as 250m, which means the distance between opposite edges is double that, or 500m
      • I used units::as_units to pass 500 metres into the argument. I am still uncertain whether a length of 500m needs to be reprojected, or whether we need to do any further transformation.
  • We use st_sf() to convert raw_hex_grid into an sf dataframe
    • mutate() is used here to create a grid_id column
    • We just use st_transform() in case we need to reproject the coordinate system, just in case.
  • However, trying to visualize this right now just gives us a map full of hexagons.
show code
raw_hex_grid = st_make_grid(sg_bus_stop_sf, cellsize = units::as_units(500, "m"), what = "polygons", square = FALSE) %>%
  st_transform(crs = 3414)
# To sf and add grid ID
raw_hex_grid <- st_sf(raw_hex_grid) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(raw_hex_grid))) %>%
  st_transform(crs = 3414)

tmap_mode("plot")
tmap mode set to plotting
show code
qtm(raw_hex_grid)

  • What we will need is to isolate only the hexes with bus stops in them.
    • We use lengths(st_intersects()) to count the number of bus stops in each hex, and add that to a new column, $n_bus_stops
    • We then create hexagon_sf by filtering for only hexes with non-zero bus stops in each.
  • We then plot these using the usual tmap() functions;
    • tm_basemap()is used to create a “basemap” layer underneath to orient our hexes.
    • We used OneMapSG as our comprehensive map of Singapore; if you zoom in, you can actually count the number of bus stops in red on the map.
show code
# Count number of points in each grid, code snippet referenced from: 
# https://gis.stackexchange.com/questions/323698/counting-points-in-polygons-with-sf-package-of-r

raw_hex_grid$n_bus_stops = lengths(st_intersects(raw_hex_grid, sg_bus_stop_sf))

# remove grid without value of 0 (i.e. no points inside that grid)
hexagon_sf = filter(raw_hex_grid, n_bus_stops > 0)
# head(hexagon_sf)

tmap_mode("view")
tmap mode set to interactive viewing
show code
tm_basemap(providers$OneMapSG.Grey) + 
  tm_shape(hexagon_sf) +
  tm_fill(
    col = "n_bus_stops",
    palette = "-plasma",
    style = "cont",
    
    breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
    title = "Number of bus_stops",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c(
      "Number of Bus Stops: " = "n_bus_stops"
    ),
    popup.format = list(
      n_bus_stops = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)
  • There seem to be 2 regions of deep purple, centred over an area near, but not exactly over Pasir Ris and Choa Chu Kang MRTs.
  • We perform some simple stats to count the total number of filtered hexes, and to see the maximum number of bus stops in a hex.
show code
cat(paste("Total number of raw hexes is\t\t\t: ", nrow(raw_hex_grid), "\n"))
Total number of raw hexes is            :  5040 
show code
cat(paste("Total number of hexes (n_bus_stop > 1) is\t: ", nrow(hexagon_sf)), "\n")
Total number of hexes (n_bus_stop > 1) is   :  1520 
show code
cat("\nPrinting map_hexagon_sf:\n >> ")

Printing map_hexagon_sf:
 >> 
show code
hexagon_sf[hexagon_sf$n_bus_stops > 10, ]
Simple feature collection with 2 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 17470.12 ymin: 38317.78 xmax: 42720.12 ymax: 40194.17
Projected CRS: SVY21 / Singapore TM
                       raw_hex_grid grid_id n_bus_stops
304  POLYGON ((17720.12 39616.82...    1584          11
1462 POLYGON ((42470.12 38317.78...    4355          11

0.3.1 Load Bus Trip Data

show code
# odbus_2308 <- read_csv("data/aspatial/origin_destination_bus_202308.csv")
# 
# # Drop pt_type
# odbus_2308 <- select(odbus_2308, -PT_TYPE)
# # alternative way in read_csv df <- read_csv("data/aspatial/origin_destination_bus_202308.csv", col_types = "ccdcffd")
# 
# odbus_2308$ORIGIN_PT_CODE <- as.factor(odbus_2308$ORIGIN_PT_CODE)
# odbus_2308$DESTINATION_PT_CODE <- as.factor(odbus_2308$DESTINATION_PT_CODE) 
# glimpse(odbus_2308)
  • sanity check number of distinct bus stops over months
# odbus_2309 <- read_csv("data/aspatial/origin_destination_bus_202309.csv")
odbus_2310 <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
Rows: 5694297 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Drop pt_type
# odbus_2309 <- select(odbus_2309, -PT_TYPE)
odbus_2310 <- select(odbus_2310, -PT_TYPE)
# alternative way in read_csv df <- read_csv("data/aspatial/origin_destination_bus_202308.csv", col_types = "ccdcffd")

# odbus_2309$ORIGIN_PT_CODE <- as.factor(odbus_2309$ORIGIN_PT_CODE)
odbus_2310$ORIGIN_PT_CODE <- as.factor(odbus_2310$ORIGIN_PT_CODE)



# cat("Confirm distinct origin bus stops 23-08: \n>> ", paste(length(unique(odbus_2308$ORIGIN_PT_CODE))))
# cat("Confirm distinct origin bus stops 23-09: \n>> ", paste(length(unique(odbus_2309$ORIGIN_PT_CODE))))
cat("Confirm distinct origin bus stops 23-10: \n>> ", paste(length(unique(odbus_2310$ORIGIN_PT_CODE))))
Confirm distinct origin bus stops 23-10: 
>>  5073

First we split out weekday and then perform mutate to group by

# odbus_2310_weekday = odbus_2310 %>%
#  filter(DAY_TYPE == "WEEKDAY") %>%
#  mutate(PEAK = case_when(
#    TIME_PER_HOUR >= 6 &  TIME_PER_HOUR <= 9 ~ "MORNING PEAK",
#    TIME_PER_HOUR >= 17 &  TIME_PER_HOUR <= 20 ~ "AFTEROON PEAK",
#    TRUE ~ "Unknown"
#  ))
#cat("Confirm $DAY_TYPE only has `WEEKDAY` value: \n>> ", paste(unique(odbus_2310_weekday$DAY_TYPE)))
#odbus_2310_weekday



odbus_filtered <- odbus_2310 %>%
  mutate(PEAK = case_when(
    DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 6 &  TIME_PER_HOUR <= 9 ~ "WEEKDAY MORNING",
    DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 17 &  TIME_PER_HOUR <= 20 ~ "WEEKDAY AFTERNOON",
    DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 11 &  TIME_PER_HOUR <= 14 ~ "WEEKEND MORNING",
    DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 16 &  TIME_PER_HOUR <= 19 ~ "WEEKEND EVENING",
    TRUE ~ "Unknown"
  )) %>%
  filter(
    case_when(
      PEAK == "Unknown" ~ FALSE,
      TRUE ~ TRUE
    )) %>%
  group_by(ORIGIN_PT_CODE, PEAK) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
odbus_filtered <- odbus_filtered %>%
  pivot_wider(names_from = PEAK, values_from = TRIPS, values_fill = 0)


write_rds(odbus_filtered, "data/rds/odbus_filtered.rds")
head(odbus_filtered)
# A tibble: 6 × 5
# Groups:   ORIGIN_PT_CODE [6]
  ORIGIN_PT_CODE `WEEKDAY AFTERNOON` `WEEKDAY MORNING` `WEEKEND EVENING`
  <fct>                        <dbl>             <dbl>             <dbl>
1 01012                         8000              1770              3061
2 01013                         7038               841              2770
3 01019                         3398              1530              1685
4 01029                         9089              2483              4063
5 01039                        12095              2919              7263
6 01059                         2212              1734              1118
# ℹ 1 more variable: `WEEKEND MORNING` <dbl>

Sanity check, number of distinct bus stops?

cat("Confirm distinct origin bus stops: \n>> ", paste(length(unique(odbus_2310$ORIGIN_PT_CODE))))
Confirm distinct origin bus stops: 
>>  5073
cat("\nConfirm distinct destination stops: \n>> ", paste(length(unique(odbus_2310$DESTINATION_PT_CODE ))))

Confirm distinct destination stops: 
>>  5077
cat("\nConfirm distinct bus stops in `bus_stop_sf`: \n>> ", paste(length(unique(sg_bus_stop_sf$BUS_STOP_N))))

Confirm distinct bus stops in `bus_stop_sf`: 
>>  5140
# odbus_2310_copy <- odbus_2310 %>%
#   rename(BUS_STOP_N = ORIGIN_PT_CODE)
# 
# non_hexagon_sf <- anti_join(bus_stop_sf, odbus_2310_copy, by = "BUS_STOP_N")
# 
# cat("\nConfirm distinct bus stops in `non_hexagon_sf`: \n>> ", paste(length(unique(non_hexagon_sf$BUS_STOP_N  ))))
# 
# mapview_non_hexag = mapview(non_hexagon_sf, cex = 3, alpha = .5, popup = NULL)
# mapview_non_hexag
# 
# distinct_odbus <- distinct(select(odbus_2310, ORIGIN_PT_CODE)) #5073
# distinct_busstop <- distinct(select(bus_stop_sf, BUS_STOP_N)%>%
#   st_drop_geometry()) #5145
# 
# anti_join(distinct_odbus, distinct_busstop, by=c("ORIGIN_PT_CODE" ="BUS_STOP_N")) # 60
# anti_join(distinct_busstop, distinct_odbus, by=c("BUS_STOP_N" = "ORIGIN_PT_CODE")) #132
  • Test to reproduce 01013 bus stop with 841 trips on weekday mornings
show code
subset_df <- odbus_2310 %>%
  filter(ORIGIN_PT_CODE == '01013') %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6) %>%
  filter(TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR, YEAR_MONTH) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS), .groups = 'keep')
head(subset_df)
# A tibble: 4 × 5
# Groups:   ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR, YEAR_MONTH [4]
  ORIGIN_PT_CODE DAY_TYPE TIME_PER_HOUR YEAR_MONTH TRIPS
  <fct>          <chr>            <dbl> <chr>      <dbl>
1 01013          WEEKDAY              6 2023-10      180
2 01013          WEEKDAY              7 2023-10      138
3 01013          WEEKDAY              8 2023-10      254
4 01013          WEEKDAY              9 2023-10      269
show code
180 + 138 + 254 + 269
[1] 841

Create bus_stop_hexgrid_id

  • Combine hexagon_sf with bus_stop_sf
show code
bus_stop_hexgrid_id <- st_intersection(sg_bus_stop_sf, hexagon_sf) %>%
  select(grid_id, BUS_STOP_N) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
show code
cat("\nNumber of bus stops\t:", length(unique(bus_stop_hexgrid_id$BUS_STOP_N)))

Number of bus stops : 5140
show code
cat("\nNumber of hexgrids\t:", length(unique(bus_stop_hexgrid_id$grid_id)))

Number of hexgrids  : 1520
show code
head(bus_stop_hexgrid_id)
     grid_id BUS_STOP_N
3265      31      25059
2566      59      25751
254       90      26379
2893     115      25761
2823     117      25719
4199     117      26389

Combine bus_stop_hexgrid_id with trip details

  • Combine bus_stop_hexgrid_id with odbus_filtered
show code
grid_trips_df <- left_join(odbus_filtered, bus_stop_hexgrid_id, 
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  select(grid_id, 
         `WEEKDAY MORNING`,
         `WEEKDAY AFTERNOON`,
         `WEEKEND MORNING`,
         `WEEKEND EVENING`)  %>%
  group_by(grid_id) %>%
  summarise(
    WEEKDAY_MORNING_TRIPS = sum(`WEEKDAY MORNING`), 
    WEEKDAY_AFTERNOON_TRIPS = sum(`WEEKDAY AFTERNOON`), 
    WEEKEND_MORNING_TRIPS = sum(`WEEKEND MORNING`), 
    WEEKEND_EVENING_TRIPS = sum(`WEEKEND EVENING`)
    )
Adding missing grouping variables: `ORIGIN_PT_CODE`
show code
head(grid_trips_df)
# A tibble: 6 × 5
  grid_id WEEKDAY_MORNING_TRIPS WEEKDAY_AFTERNOON_TRIPS WEEKEND_MORNING_TRIPS
    <int>                 <dbl>                   <dbl>                 <dbl>
1      31                   103                     390                     0
2      59                    52                     114                    26
3      90                    78                     291                    52
4     115                   185                    1905                   187
5     117                  1116                    2899                   455
6     118                    53                     241                    76
# ℹ 1 more variable: WEEKEND_EVENING_TRIPS <dbl>

Left join back to hexagon_df

show code
hexagon_sf_2 <- left_join(hexagon_sf, grid_trips_df, 
            by = 'grid_id' ) %>%
  mutate(
    WEEKDAY_MORNING_TRIPS = ifelse(is.na(WEEKDAY_MORNING_TRIPS), 0, WEEKDAY_MORNING_TRIPS),
    WEEKDAY_AFTERNOON_TRIPS = ifelse(is.na(WEEKDAY_AFTERNOON_TRIPS), 0, WEEKDAY_AFTERNOON_TRIPS),
    WEEKEND_MORNING_TRIPS = ifelse(is.na(WEEKEND_MORNING_TRIPS), 0, WEEKEND_MORNING_TRIPS),
    WEEKEND_EVENING_TRIPS = ifelse(is.na(WEEKEND_EVENING_TRIPS), 0, WEEKEND_EVENING_TRIPS),
         )

head(hexagon_sf_2)
Simple feature collection with 6 features and 6 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3720.122 ymin: 27925.48 xmax: 4970.122 ymax: 31533.92
Projected CRS: SVY21 / Singapore TM
  grid_id n_bus_stops WEEKDAY_MORNING_TRIPS WEEKDAY_AFTERNOON_TRIPS
1      31           1                   103                     390
2      59           1                    52                     114
3      90           1                    78                     291
4     115           1                   185                    1905
5     117           2                  1116                    2899
6     118           1                    53                     241
  WEEKEND_MORNING_TRIPS WEEKEND_EVENING_TRIPS                   raw_hex_grid
1                     0                    56 POLYGON ((3970.122 27925.48...
2                    26                    14 POLYGON ((4220.122 28358.49...
3                    52                   100 POLYGON ((4470.122 30523.55...
4                   187                   346 POLYGON ((4720.122 28358.49...
5                   455                   634 POLYGON ((4720.122 30090.54...
6                    76                    55 POLYGON ((4720.122 30956.57...
  • Now to try to do data analysis
  • In my defense, I emailed prof kam for DAL notes but he ignroed me for 6 months
show code
summary(hexagon_sf_2$WEEKDAY_MORNING_TRIPS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0     919    7566   16812   23245  386450 
show code
hist(hexagon_sf_2$WEEKDAY_MORNING_TRIPS, 
     main = "Histogram Example", 
     xlab = "WEEKDAY_MORNING_TRIPS", 
     col = "lightblue", 
     border = "black")

show code
# Load the ggplot2 package
library(ggplot2)


trip_col_names <- c("WEEKDAY_MORNING_TRIPS", "WEEKDAY_AFTERNOON_TRIPS", "WEEKEND_MORNING_TRIPS", "WEEKEND_EVENING_TRIPS")

par(mfrow = c(2, 2))  # Set up a 2x2 layout
custom_breaks <- seq(0, 550000, by = 50000)

for (col in trip_col_names) {
  hist(hexagon_sf_2[[col]], main = col, col = "lightblue", border = "black",
       breaks = custom_breaks)
}

show code
colnames(hexagon_sf_2)
[1] "grid_id"                 "n_bus_stops"            
[3] "WEEKDAY_MORNING_TRIPS"   "WEEKDAY_AFTERNOON_TRIPS"
[5] "WEEKEND_MORNING_TRIPS"   "WEEKEND_EVENING_TRIPS"  
[7] "raw_hex_grid"           

Map hexes on SIngapore

show code
sarb_sf <- st_read(dsn = "data/geospatial", 
                 layer = "Subzone_Census2010") %>%
  st_transform(crs = 3414)
Reading layer `Subzone_Census2010' from data source 
  `C:\1darren\ISSS624\Take-home_Ex01\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 311 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2637.869 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
show code
tmap_options(check.and.fix = TRUE)
qtm(sarb_sf)
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid
show code
## Additional data from: Data.gov.sg, https://beta.data.gov.sg/datasets/d_02cba6aeeed323b5f6c723527757c0bc/view
show code
tm_shape(sarb_sf) +
  tm_polygons() + 
  tm_shape(hexagon_sf_2) + 
  tm_fill("WEEKDAY_MORNING_TRIPS", 
          style = "quantile", 
          palette = "Blues",
          title = "Number of trips")
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid
show code
tm_shape(sarb_sf) +
  tm_polygons() + 
  tm_shape(hexagon_sf_2) + 
  tm_fill("WEEKDAY_AFTERNOON_TRIPS", 
          style = "quantile", 
          palette = "Greens",
          title = "Dependency ratio")
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid
tm_shape(sarb_sf) +
  tm_polygons() + 
  tm_shape(hexagon_sf_2) + 
  tm_fill("WEEKEND_MORNING_TRIPS", 
          style = "quantile", 
          palette = "Reds",
          title = "Dependency ratio")
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid
show code
tm_shape(sarb_sf) +
  tm_polygons() + 
  tm_shape(hexagon_sf_2) + 
  tm_fill("WEEKEND_EVENING_TRIPS", 
          style = "quantile", 
          palette = "Oranges",
          title = "Dependency ratio")
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid
show code
# tm_shape(sarb_sf) +
#   tm_polygons() + 
#   tm_shape(hexagon_sf_2) + 
#   tm_fill("WEEKDAY_MORNING_TRIPS", 
#           style = "cont", 
#           palette = "viridis",
#           breaks = custom_breaks,
#           title = "Dependency ratio")
# tm_shape(sarb_sf) +
#   tm_polygons() + 
#   tm_shape(hexagon_sf_2) + 
#   tm_fill("WEEKDAY_AFTERNOON_TRIPS", 
#           style = "cont", 
#           palette = "viridis",
#           breaks = custom_breaks,
#           title = "Dependency ratio")
# tm_shape(sarb_sf) +
#   tm_polygons() + 
#   tm_shape(hexagon_sf_2) + 
#   tm_fill("WEEKEND_MORNING_TRIPS", 
#           style = "cont", 
#           palette = "viridis",
#           breaks = custom_breaks,
#           title = "Dependency ratio")
# tm_shape(sarb_sf) +
#   tm_polygons() + 
#   tm_shape(hexagon_sf_2) + 
#   tm_fill("WEEKEND_EVENING_TRIPS", 
#           style = "cont", 
#           palette = "viridis",
#           breaks = custom_breaks,
#           title = "Dependency ratio")

Plot Weekday Morning

show code
# mapview(hexagon_sf_2, zcol="WEEKDAY_MORNING_TRIPS", cex = 3, alpha = .5, popup = NULL)

Plot Weekday Afternoons

show code
# mapview(hexagon_sf_2, zcol="WEEKDAY_AFTERNOON_TRIPS", cex = 3, alpha = .5, popup = NULL)

Plot Weekend Morning

show code
# mapview(hexagon_sf_2, zcol="WEEKEND_MORNING_TRIPS", cex = 3, alpha = .5, popup = NULL)

Plot Weekend Afternoons

show code
# mapview(hexagon_sf_2, zcol="WEEKEND_EVENING_TRIPS", cex = 3, alpha = .5, popup = NULL)
Quiz: What statistical conclusion can you draw from the output above?
  • Using both fixed- and adaptive-distance neigbours generates similar results; values are largely in agreement
  • knn=8 feels like it’s not very logical, but maybe there’s not too much difference

2. Create LISA

Create Queen contiguity weight matrix hex

show code
wm_hex <- st_contiguity(hexagon_sf_2, queen=TRUE)
summary(wm_hex)
Neighbour list object:
Number of regions: 1520 
Number of nonzero links: 6874 
Percentage nonzero weights: 0.2975242 
Average number of links: 4.522368 
9 regions with no links:
561 726 980 1047 1415 1505 1508 1512 1520
Link number distribution:

  0   1   2   3   4   5   6 
  9  39 109 205 291 364 503 
39 least connected regions:
1 7 22 38 98 169 187 195 211 218 258 259 264 267 287 454 562 607 642 696 708 732 751 784 869 1021 1022 1046 1086 1214 1464 1471 1482 1500 1501 1503 1506 1510 1519 with 1 link
503 most connected regions:
10 13 16 17 24 25 31 35 42 43 48 53 55 60 63 67 73 77 80 81 84 85 87 88 91 92 97 102 107 111 117 121 127 133 140 141 143 148 149 150 154 155 156 157 163 164 165 173 174 175 183 184 185 191 192 193 194 200 201 202 205 206 207 208 216 229 239 243 244 246 257 266 271 278 279 283 284 291 292 298 300 301 302 304 309 310 312 313 316 321 324 325 327 337 338 339 340 343 352 355 363 368 390 391 400 402 403 407 414 418 423 425 431 436 437 438 440 443 450 451 452 461 466 467 468 469 473 477 480 481 485 489 493 494 496 502 503 507 513 514 517 518 523 529 534 539 543 548 549 550 552 556 558 564 568 573 574 576 577 581 590 591 594 598 599 604 605 609 615 619 624 626 633 636 637 638 648 649 650 654 655 657 658 659 669 670 671 677 680 681 682 687 688 690 691 700 701 704 705 706 713 716 717 724 727 728 729 740 741 755 757 758 760 771 774 775 776 777 782 783 787 788 789 792 793 794 795 799 800 806 807 810 811 812 813 819 820 823 824 825 830 831 832 841 843 844 846 847 848 850 851 852 853 854 860 863 865 866 867 871 872 876 877 878 880 881 882 884 885 887 888 891 893 896 899 902 905 906 910 914 919 921 926 927 928 930 931 935 937 943 944 945 946 947 948 954 958 959 962 963 968 969 971 972 973 977 984 985 986 987 988 990 996 997 998 999 1004 1011 1012 1013 1014 1024 1025 1026 1028 1029 1036 1037 1038 1042 1050 1051 1054 1056 1057 1062 1063 1064 1066 1067 1068 1069 1076 1078 1079 1080 1083 1089 1093 1100 1101 1102 1105 1106 1110 1111 1117 1120 1121 1122 1128 1133 1134 1135 1136 1141 1142 1144 1145 1146 1147 1148 1150 1156 1157 1158 1162 1163 1164 1166 1169 1170 1171 1172 1176 1177 1178 1179 1180 1184 1186 1190 1191 1192 1193 1194 1201 1202 1203 1204 1205 1206 1207 1210 1211 1217 1218 1219 1220 1221 1227 1233 1234 1235 1239 1244 1245 1251 1253 1254 1255 1261 1265 1266 1271 1272 1273 1277 1281 1283 1289 1299 1301 1302 1303 1304 1316 1318 1324 1325 1326 1327 1329 1330 1331 1334 1335 1336 1337 1343 1344 1345 1352 1353 1355 1356 1361 1365 1366 1368 1369 1371 1372 1377 1380 1381 1382 1384 1388 1391 1393 1395 1398 1406 1408 1412 1417 1418 1420 1424 1425 1426 1427 1428 1433 1434 1435 1436 1440 1441 1442 1446 1447 1449 1451 1453 1456 1457 1459 1460 1461 1462 1469 with 6 links

Create Queen contiguity

show code
hex_no_nb <- c(307, 565, 730, 984, 1051, 1419, 1509, 1512, 1516, 1524)
hexagon_sf_2_drop_nonb <- hexagon_sf_2 %>%
  mutate(RowNumber = row_number()) %>%
  subset( !(RowNumber  %in% hex_no_nb))

vis_excluded_hexes <- tm_shape(sarb_sf) +
  tm_polygons() + 
  tm_shape(hexagon_sf_2) + 
  tm_fill("red") + 
  tm_shape(hexagon_sf_2_drop_nonb) + 
  tm_fill("green")
vis_excluded_hexes
Warning: The shape sarb_sf is invalid (after reprojection). See sf::st_is_valid

Drop JB Bus Stop

Find Northernmost point: JB Bus stop

show code
# coordinates <- st_coordinates(raw_bus_stop_sf$geometry)
# 
# # Find the index of the northernmost point
# northernmost_index <- which.max(coordinates[, 2])
# 
# # Extract the northernmost point
# northernmost_point <- raw_bus_stop_sf[northernmost_index, ]
# northernmost_point # 46239

Identify the hex to drop

show code
# jb_grid_id <- pull(bus_stop_hexgrid_id[bus_stop_hexgrid_id$BUS_STOP_N == 46239, 'grid_id']) #1767
# hexagon_sf_2_drop_jb <- 
#   subset(hexagon_sf_2, !(grid_id == jb_grid_id))

Now we figure out distance; first find centroid

show code
# longitude <- map_dbl(hexagon_sf_2_drop_jb$raw_hex_grid, ~st_centroid(.x)[[1]])
# latitude <- map_dbl(hexagon_sf_2_drop_jb$raw_hex_grid, ~st_centroid(.x)[[2]])
# coords <- cbind(longitude, latitude)
# # cat("Printing first 6 rows of `coords`:\n")
# # head(coords)
# 
# 
# k1_nn_obj <- st_knn(coords, k = 1)
# k1dists <- unlist(st_nb_dists(coords, k1_nn_obj))
# summary(k1dists)
  • 2km is a huge distance – we would enmesh hexes to about 4 hexes away
  • bus stops are
show code
# check_dist <- st_nb_dists(coords, k1_nn_obj)
# which.max(check_dist) # 1523
# 
# 
# tm_shape(sarb_sf) +
#   tm_polygons() + 
#   tm_shape(hexagon_sf_2_drop_jb) + 
#   tm_fill("cyan") + 
#   tm_shape(hexagon_sf_2_drop_jb[1523,]) + 
#   tm_fill("orange")

Find Northernmost point: Changi Naval Base Point

show code
# coordinates <- st_coordinates(raw_bus_stop_sf$geometry)
# 
# 
# 
# # Find the index of the northernmost point
# easternmost_index <- which.max(coordinates[, 1])
# 
# # Extract the northernmost point
# easternmost_point <- raw_bus_stop_sf[easternmost_index, ]
# easternmost_point # 96439        

Identify the hex to drop

show code
# naval_base_grid_id <- pull(bus_stop_hexgrid_id[bus_stop_hexgrid_id$BUS_STOP_N == 96439, 'grid_id']) #1767
# hexagon_sf_2_drop_changi <- 
#   subset(hexagon_sf_2_drop_jb, !(grid_id == naval_base_grid_id))
# 
# longitude_2 <- map_dbl(hexagon_sf_2_drop_changi$raw_hex_grid, ~st_centroid(.x)[[1]])
# latitude_2 <- map_dbl(hexagon_sf_2_drop_changi$raw_hex_grid, ~st_centroid(.x)[[2]])
# coords_2 <- cbind(longitude_2, latitude_2)
# k1_nn_obj_2 <- st_knn(coords_2, k = 1)
# k1dists_2 <- unlist(st_nb_dists(coords_2, k1_nn_obj_2))
# summary(k1dists_2)
  • A much more manageable 1km; two hexes away. Let’s work with this.

Now we figure out distance; first find centroid

show code
# hex_1km_nb <- st_dist_band(coords_2, lower = 0, upper = 1000.1)
# head(hex_1km_nb, 5)
# 
# hex_1km_wt <- st_weights(hex_1km_nb)
# 
# head(hex_1km_wt, 5)

Calculate Local Moran’s I

show code
# localMI_day_morn <- local_moran(hexagon_sf_2_drop_changi$WEEKDAY_MORNING_TRIPS , hex_1km_nb, hex_1km_wt)
# 
# hexagon_sf_local_moran <- cbind(hexagon_sf_2_drop_changi,localMI_day_morn) 

Identify the hex to drop

show code
# local_moran_stats <- tm_shape(sarb_sf) +
#   tm_polygons() +
#   tm_shape(hexagon_sf_local_moran) +
#   tm_fill(col = "ii", 
#           style = "pretty",
#           palette = "RdBu",
#           title = "Local Moran Statistics") +
#   tm_borders(alpha = 0.5) +
#   tm_layout(main.title = "Local Moran Statistics")
# 
# local_moran_p <- tm_shape(sarb_sf) +
#   tm_polygons() +
#   tm_shape(hexagon_sf_local_moran) +
#   tm_fill(col = "p_ii", 
#           breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
#           palette="-Greens", 
#           title = "local Moran's I p-values") +
#   tm_borders(alpha = 0.5) +
#   tm_layout(main.title = "Local Moran's I p-values")
# 
# tmap_arrange(local_moran_stats, 
#              local_moran_p, 
#              asp=1, 
#              ncol=2)
  • But we only want to see the ones with p < 0.05, so we need to do some cleaning
show code
# hexagon_sf_local_moran_sig_only <- hexagon_sf_local_moran %>%
#   subset( !(p_ii  < 0.05))
# 
# tm_shape(sarb_sf) +
#   tm_polygons() +
#   tm_shape(hexagon_sf_local_moran) +
#   tm_fill("darkgrey") +
#   tm_shape(hexagon_sf_local_moran_sig_only) +
#   tm_fill(col = "ii", 
#           style = "pretty",
#           palette = "RdBu",
#           title = "Local Moran Statistics") +
#   tm_borders(alpha = 0.5) +
#   tm_layout(main.title = "Local Moran Statistics")

Moran Scatterplot

show code
# Create lag


# hexagon_sf_local_moran <- hexagon_sf_local_moran %>%
#   mutate(trips_lag = st_lag(hexagon_sf_local_moran$WEEKDAY_MORNING_TRIPS, hex_1km_nb, hex_1km_wt))
# 
# ggplot(hexagon_sf_local_moran, aes(WEEKDAY_MORNING_TRIPS, trips_lag)) +
#   geom_point() +
#   geom_vline(xintercept = mean(hexagon_sf_local_moran$WEEKDAY_MORNING_TRIPS), linetype = "dashed", color = "red", size = 1.2) +
#   geom_hline(yintercept = mean(hexagon_sf_local_moran$trips_lag), linetype = "dashed", color = "red", size = 1.5) +
#   labs(title = "Scatterplot Example", x = "X-axis", y = "Y-axis")

Plot Local Moran’s

show code
# hexa_sig <- hexagon_sf_local_moran  %>%
#   filter(p_ii_sim < 0.05)
# 
# 
# tmap_mode("plot")
# tm_shape(sarb_sf) +
#   tm_polygons() + 
# tm_shape(hexagon_sf_local_moran) +
#   tm_fill("mean") +
#   tm_borders(alpha = 0.5)